
    A discriminative method for protein remote homology detection and fold recognition combining Top-n-grams and latent semantic analysis

    Background: Protein remote homology detection and fold recognition are central problems in bioinformatics. Currently, discriminative methods based on support vector machines (SVMs) are the most effective and accurate methods for solving these problems. A key step in improving the performance of SVM-based methods is finding a suitable representation of protein sequences.
    Results: In this paper, a novel building block of proteins called Top-n-grams is presented, which contains the evolutionary information extracted from protein sequence frequency profiles. The frequency profiles are calculated from the multiple sequence alignments output by PSI-BLAST and converted into Top-n-grams. The protein sequences are transformed into fixed-dimension feature vectors by counting the occurrences of each Top-n-gram. The training vectors are used to train SVM classifiers, which are then used to classify the test protein sequences. We demonstrate that the prediction performance of remote homology detection and fold recognition can be improved by combining Top-n-grams and latent semantic analysis (LSA), an efficient feature extraction technique from natural language processing. When tested on superfamily and fold benchmarks, the method combining Top-n-grams and LSA gives significantly better results than related methods.
    Conclusion: The method based on Top-n-grams significantly outperforms methods based on many other building blocks, including N-grams, patterns, motifs and binary profiles. Therefore, the Top-n-gram is a good building block of protein sequences and can be widely used in many tasks of computational biology, such as sequence alignment, the prediction of domain boundaries, the design of knowledge-based potentials and the prediction of protein binding sites.
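    The conversion from a frequency profile to a fixed-dimension count vector can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that a Top-n-gram is the n most frequent amino acids at one profile position, listed in descending frequency order; the abstract does not spell this out. The function names and the toy profile are hypothetical, and the PSI-BLAST, LSA and SVM steps are not covered.

```python
from collections import Counter
from itertools import permutations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def vocabulary(n):
    """All ordered selections of n distinct amino acids (assumed Top-n-gram alphabet)."""
    return ["".join(p) for p in permutations(AMINO_ACIDS, n)]


def top_n_gram(column, n):
    """Return the n most frequent residues of one profile column, most frequent first.

    `column` maps residue -> frequency; missing residues count as 0.
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(AMINO_ACIDS, key=lambda aa: (-column.get(aa, 0.0), aa))
    return "".join(ranked[:n])


def top_n_gram_vector(profile, n):
    """Count Top-n-gram occurrences over all profile columns as a fixed-length vector."""
    counts = Counter(top_n_gram(column, n) for column in profile)
    return [counts[word] for word in vocabulary(n)]


# Toy profile with three positions (hypothetical frequencies).
profile = [
    {"A": 0.6, "G": 0.3, "S": 0.1},
    {"L": 0.5, "I": 0.4, "V": 0.1},
    {"A": 0.7, "G": 0.2, "T": 0.1},
]
vector = top_n_gram_vector(profile, 2)
```

    Under this assumption the vector length depends only on n (380 entries for n = 2), so sequences of any length map to the same feature space.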

    Word correlation matrices for protein sequence analysis and remote homology detection

    Background: Classification of protein sequences is a central problem in computational biology. Currently, among computational methods, discriminative kernel-based approaches provide the most accurate results. However, kernel-based methods often lack an interpretable model for analysis of discriminative sequence features, and predictions on new sequences are usually computationally expensive.
    Results: In this work we present a novel kernel for protein sequences based on average word similarity between two sequences. We show that this kernel gives rise to a feature space that allows analysis of discriminative features and fast classification of new sequences. We demonstrate the performance of our approach on a widely-used benchmark setup for protein remote homology detection.
    Conclusion: Our word correlation approach provides highly competitive performance as compared with state-of-the-art methods for protein remote homology detection. The learned model is interpretable in terms of biologically meaningful features. In particular, analysis of discriminative words allows the identification of characteristic regions in biological sequences. Because of its high computational efficiency, our method can be applied to ranking of potential homologs in large databases.

    Exploiting residue-level and profile-level interface propensities for usage in binding sites prediction of proteins

    Background: Recognition of binding sites in proteins is a direct computational approach to the characterization of proteins in terms of biological and biochemical function. Residue preferences have been widely used in many studies, but the results are often not satisfactory. Although different amino acid compositions among the interaction sites of different complexes have been observed, such differences have not been integrated into the prediction process. Furthermore, evolutionary information has not been exploited to achieve a more powerful propensity.
    Results: In this study, the residue interface propensities of four kinds of complexes (homo-permanent, homo-transient, hetero-permanent and hetero-transient complexes) are investigated. These propensities, combined with sequence profiles and accessible surface areas, are input to a support vector machine for the prediction of protein binding sites. The propensities are further improved by taking evolutionary information into consideration, which results in a class of novel propensities at the profile level, i.e. the binary profile interface propensities. Experiments are performed on 1139 non-redundant protein chains. Although different residue interface propensities are observed among different complexes, the improvement of the classifier with residue interface propensities is negligible in comparison with the classifier without propensities. The binary profile interface propensities significantly improve the performance of binding site prediction, by about ten percent in terms of both precision and recall.
    Conclusion: Although there are minor differences among the four kinds of complexes, the residue interface propensities cannot provide efficient discrimination for the complicated interfaces of proteins. The binary profile interface propensities significantly improve the performance of protein binding site prediction, which indicates that propensities at the profile level are more accurate than those at the residue level.

    Subfamily specific conservation profiles for proteins based on n-gram patterns

    Background: A new algorithm has been developed for generating conservation profiles that reflect the evolutionary history of the subfamily associated with a query sequence. It is based on n-gram patterns (NP{n,m}), which are sets of n residues and m wildcards in windows of size n+m. The generation of conservation profiles is treated as a signal-to-noise problem where the signal is the count of n-gram patterns in target sequences that are similar to the query sequence and the noise is the count over all target sequences. The signal is differentiated from the noise by applying singular value decomposition to sets of target sequences rank ordered by similarity with respect to the query.
    Results: The new algorithm was used to construct 4,248 profiles from 120 randomly selected Pfam-A families. These were compared to profiles generated from multiple alignments using the consensus approach. The two profiles were similar whenever the subfamily associated with the query sequence was well represented in the multiple alignment. It was possible to construct subfamily specific conservation profiles using the new algorithm for subfamilies with as few as five members. The speed of the new algorithm was comparable to the multiple alignment approach.
    Conclusion: Subfamily specific conservation profiles can be generated by the new algorithm without a priori knowledge of family relationships or domain architecture. This is useful when the subfamily contains multiple domains with different levels of representation in protein databases. It may also be applicable when the subfamily sample size is too small for the multiple alignment approach.
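    As a rough illustration of the NP{n,m} patterns described above, the sketch below enumerates every way of masking m positions in each window of size n+m and counts the resulting patterns. It assumes that wildcards may fall at any position in the window, including the ends, which the abstract does not specify. The function name and the `.` wildcard symbol are hypothetical. The similarity ranking and singular value decomposition steps are not covered.

```python
from collections import Counter
from itertools import combinations


def ngram_patterns(sequence, n, m, wildcard="."):
    """Count NP{n,m}-style patterns: n residues and m wildcards per window of size n+m."""
    width = n + m
    counts = Counter()
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Choose which m positions of the window become wildcards.
        for masked in combinations(range(width), m):
            pattern = "".join(
                wildcard if i in masked else residue
                for i, residue in enumerate(window)
            )
            counts[pattern] += 1
    return counts


# Two residues and one wildcard per window of size three.
patterns = ngram_patterns("ACDA", n=2, m=1)
```

    With m = 0 the function reduces to ordinary n-gram counting, which gives a simple sanity check.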

    Observation of a $p\bar{p}$ mass threshold enhancement in $\psi^\prime\to\pi^+\pi^-J/\psi(J/\psi\to\gamma p\bar{p})$ decay

    The decay channel $\psi^\prime\to\pi^+\pi^-J/\psi(J/\psi\to\gamma p\bar{p})$ is studied using a sample of $1.06\times 10^8$ $\psi^\prime$ events collected by the BESIII experiment at BEPCII. A strong enhancement at threshold is observed in the $p\bar{p}$ invariant mass spectrum. The enhancement can be fit with an $S$-wave Breit-Wigner resonance function with a resulting peak mass of $M=1861^{+6}_{-13}\,{\rm (stat)}^{+7}_{-26}\,{\rm (syst)}~{\rm MeV}/c^2$ and a narrow width of $\Gamma<38~{\rm MeV}/c^2$ at the 90% confidence level. These results are consistent with published BESII results. These mass and width values do not match those of any known meson resonance. Comment: 5 pages, 3 figures, submitted to Chinese Physics

    Physicochemical property distributions for accurate and rapid pairwise protein homology detection

    Background: The challenge of remote homology detection is that many evolutionarily related sequences have very little similarity at the amino acid level. Kernel-based discriminative methods, such as support vector machines (SVMs), that use vector representations of sequences derived from sequence properties have been shown to have superior accuracy when compared to traditional approaches for the task of remote homology detection.
    Results: We introduce a new method for feature vector representation based on the physicochemical properties of the primary protein sequence. A distribution of physicochemical property scores is assembled from 4-mers of the sequence and normalized based on the null distribution of the property over all possible 4-mers. With this approach there is little computational cost associated with the transformation of the protein into feature space, and overall performance in terms of remote homology detection is comparable with current state-of-the-art methods. We demonstrate that the features can be used for the task of pairwise remote homology detection with improved accuracy versus sequence-based methods such as BLAST and other feature-based methods of similar computational cost.
    Conclusions: A protein feature method based on physicochemical properties is a viable approach for extracting features in a computationally inexpensive manner while retaining the sensitivity of SVM protein homology detection. Furthermore, identifying features that can be used for generic pairwise homology detection in lieu of family-based homology detection is important for applications such as large database searches and comparative genomics.
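    The 4-mer property distribution described above might look like the following sketch. Several choices here are assumptions for illustration only and are not taken from the abstract: the Kyte-Doolittle hydropathy scale stands in for the physicochemical property, a 4-mer is scored by the mean of its residue values, scores are binned into a fixed histogram, and normalization divides each observed bin fraction by the corresponding fraction over all 20^4 possible 4-mers. All function names are hypothetical.

```python
from functools import lru_cache
from itertools import product

# Kyte-Doolittle hydropathy values, used here as a stand-in property (assumption).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
LOW, HIGH = -4.5, 4.5  # range of possible mean scores on this scale


def kmer_score(kmer):
    """Score a k-mer as the mean property value of its residues (assumption)."""
    return sum(KD[aa] for aa in kmer) / len(kmer)


def histogram(scores, bins):
    """Return the fraction of scores that falls into each of `bins` equal-width bins."""
    counts = [0] * bins
    width = (HIGH - LOW) / bins
    for score in scores:
        counts[min(int((score - LOW) / width), bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no k-mers to score; sequence shorter than k?")
    return [c / total for c in counts]


@lru_cache(maxsize=None)
def null_distribution(k, bins):
    """Histogram of scores over all possible k-mers (20**k of them)."""
    return tuple(histogram((kmer_score(km) for km in product(KD, repeat=k)), bins))


def property_features(sequence, k=4, bins=9):
    """Observed k-mer score distribution divided bin-wise by the null distribution."""
    kmers = (sequence[i:i + k] for i in range(len(sequence) - k + 1))
    observed = histogram((kmer_score(km) for km in kmers), bins)
    null = null_distribution(k, bins)
    return [o / n if n > 0 else 0.0 for o, n in zip(observed, null)]


features = property_features("ACDEFGHIKLMNPQRSTVWY")
```

    The resulting vector has one entry per bin regardless of sequence length, so two proteins can be compared directly, which is the property a pairwise detection method needs.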

    MiR-223 Suppresses Cell Proliferation by Targeting IGF-1R

    To study the role of microRNA-223 (miR-223) in the regulation of cell growth, we established a miR-223 over-expression model in HeLa cells by infecting them with miR-223 using the lentivirus pLL3.7 system. In this model, miR-223 significantly suppressed the proliferation, growth rate and colony formation of HeLa cells in vitro, as well as tumor formation in nude mice in vivo. To investigate the mechanisms involved, we screened putative target molecules of miR-223 by informatics, quantitative PCR and Western blot, and found that insulin-like growth factor-1 receptor (IGF-1R) was the functional target through which miR-223 inhibits cell proliferation. Targeting of IGF-1R by miR-223 was seen not only in HeLa cells but also in leukemia and hepatoma cells. The downstream pathway of IGF-1R signaling, Akt/mTOR/p70S6K, was inhibited as well. The relative luciferase activity of the reporter containing the wild-type 3′UTR (3′ untranslated region) of IGF-1R was significantly suppressed, whereas that of the mutant reporter was not. Silencing IGF-1R expression with a vector-based short hairpin RNA produced inhibition similar to that caused by miR-223. Conversely, rescuing IGF-1R expression in cells over-expressing miR-223, by introducing IGF-1R cDNA that did not contain the 3′UTR, reversed the inhibition caused by miR-223. We also noted that miR-223 targeted Rasa1, but the downstream molecules mediated by Rasa1 were neither targeted nor regulated. Therefore, we believe that IGF-1R is the functional target for miR-223 suppression of cell proliferation, and that miR-223 suppresses the downstream PI3K/Akt/mTOR/p70S6K pathway by targeting IGF-1R.

    PDNAsite: identification of DNA-binding site from protein sequence by incorporating spatial and sequence context

    Protein-DNA interactions are involved in many fundamental biological processes essential for cellular function. Most existing computational approaches employ only the sequence context of the target residue for its prediction. In the present study, for each target residue, we applied both the spatial context and the sequence context to construct the feature space. Subsequently, Latent Semantic Analysis (LSA) was applied to remove the redundancies in the feature space. Finally, a predictor (PDNAsite) was developed through the integration of a support vector machine (SVM) classifier and ensemble learning. Results on the PDNA-62 and PDNA-224 datasets demonstrate that features extracted from the spatial context provide more information than those from the sequence context, and that combining them gives a further performance gain. An analysis of the number of binding sites in the spatial context of the target site indicates that the interactions between binding sites next to each other are important for protein-DNA recognition and their binding ability. The comparison between our proposed PDNAsite method and existing methods indicates that PDNAsite outperforms most of the existing methods and is a useful tool for DNA-binding site identification. A web server for our predictor (http://hlt.hitsz.edu.cn:8080/PDNAsite/) is freely accessible to the biological research community.

    Disease-Related Cardiac Troponins Alter Thin Filament Ca2+ Association and Dissociation Rates

    The contractile response of the heart can be altered by disease-related protein modifications to numerous contractile proteins. By utilizing an IAANS-labeled fluorescent troponin C, we examined the effects of ten disease-related troponin modifications on the Ca2+ binding properties of the troponin complex and the reconstituted thin filament. The selected modifications are associated with a broad range of cardiac diseases: three subtypes of familial cardiomyopathies (dilated, hypertrophic and restrictive) and ischemia-reperfusion injury. Consistent with previous studies, the majority of the protein modifications had no effect on the Ca2+ binding properties of the isolated troponin complex. However, when incorporated into the thin filament, dilated cardiomyopathy mutations desensitized (up to 3.3-fold), while hypertrophic and restrictive cardiomyopathy mutations, and ischemia-induced truncation of troponin I, sensitized the thin filament to Ca2+ (up to 6.3-fold). Kinetically, the dilated cardiomyopathy mutations increased the rate of Ca2+ dissociation from the thin filament (up to 2.5-fold), while the hypertrophic and restrictive cardiomyopathy mutations, and the ischemia-induced truncation of troponin I, decreased the rate (up to 2-fold). The protein modifications also increased (up to 5.4-fold) or decreased (up to 2.5-fold) the apparent rate of Ca2+ association to the thin filament. Thus, the disease-related protein modifications alter Ca2+ binding by influencing both the association and dissociation rates of thin filament Ca2+ exchange. These alterations in Ca2+ exchange kinetics influenced the response of the thin filament to artificial Ca2+ transients generated in a stopped-flow apparatus. Troponin C may act as a hub, sensing physiological and pathological stimuli to modulate the Ca2+-binding properties of the thin filament and influence the contractile performance of the heart.

    The Worker Honeybee Fat Body Proteome Is Extensively Remodeled Preceding a Major Life-History Transition

    Honeybee workers are essentially sterile female helpers that make up the majority of individuals in a colony. Workers display a marked change in physiology when they transition from in-nest tasks to foraging. Recent technological advances have made it possible to unravel the metabolic modifications associated with this transition. Previous studies have revealed extensive remodeling of brain, thorax, and hypopharyngeal gland biochemistry. However, data on changes in the abdomen are scarce. To narrow this gap, we investigated the proteomic composition of abdominal tissue in the days typically preceding the onset of foraging in honeybee workers.